Abstract. Efficient optimal prefix coding has long been accomplished via the Huffman algorithm. However, there is still room for improvement and exploration regarding variants of the Huffman problem. Length-limited Huffman coding, useful for many practical applications, is one such variant, in which codes are restricted to the set of codes in which none of the $n$ codewords is longer than a given length, $l_{\max}$. Binary length-limited coding can be done in $O(n l_{\max})$ time and $O(n)$ space via the widely used Package-Merge algorithm. In this paper the Package-Merge approach is generalized without increasing complexity in order to introduce a minimum codeword length, $l_{\min}$, to allow for objective functions other than the minimization of expected codeword length, and to be applicable to both binary and nonbinary codes; nonbinary codes were previously addressed using a slower dynamic programming approach. These extensions have various applications, including faster decompression, and can be used to solve the problem of finding an optimal code with limited fringe, that is, finding the best code among codes with a maximum difference between the longest and shortest codewords. The previously proposed method for solving this problem was nonpolynomial time, whereas solving this using the novel algorithm requires only $O(n(l_{\max} - l_{\min})^2)$ time and $O(n)$ space.

I. Introduction 

A source emits input symbols drawn from the alphabet $X = \{1, 2, \ldots, n\}$, where $n$ is an integer. Symbol $i$ has probability $p_i$, thus defining probability vector $p = (p_1, p_2, \ldots, p_n)$. Only possible symbols are considered for coding and these are, without loss of generality, sorted in nonincreasing order of probability; thus $p_i > 0$ and $p_i \le p_j$ for every $i > j$ such that $i, j \in X$. Each input symbol is encoded into a codeword composed of output symbols of the $D$-ary alphabet $\{0, 1, \ldots, D - 1\}$. The codeword $c_i$ corresponding to input symbol $i$ has length $l_i$, thus defining length vector $l = (l_1, l_2, \ldots, l_n)$. The code should be a prefix code, i.e., no codeword $c_i$ should begin with the entirety of another codeword $c_j$.

For the bounded-length coding variant of Huffman coding introduced here, all codewords must have lengths lying in a given interval $[l_{\min}, l_{\max}]$. Consider an application in the problem of designing a data codec which is efficient in terms of both compression ratio and coding speed. Moffat and Turpin proposed a variety of efficient implementations of prefix encoding and decoding in [1], each involving table lookups rather than code trees. They noted that the length of the longest codeword should be limited for computational efficiency's sake. Computational efficiency is also improved by restricting the overall range of codeword lengths, reducing the size of the tables and the expected time of searches required for decoding. Thus, one might wish to have a minimum codeword size of, say, $l_{\min} = 16$ bits and a maximum codeword size of $l_{\max} = 32$ bits ($D = 2$). If the expected codeword length for an optimal code found under these restrictions is too long, $l_{\min}$ can be reduced and the algorithm rerun until the proper tradeoff between coding speed and compression ratio is found.

A similar problem is one of determining opcodes of a microprocessor designed to use variable-length opcodes, each a certain number of bytes ($D = 256$) with a lower limit and an upper limit to size, e.g., a restriction to opcodes being 16, 24, or 32 bits long ($l_{\min} = 2$, $l_{\max} = 4$). This problem clearly falls within the context considered here, as does the problem of assigning video recorder scheduling codes; these human-readable decimal codes ($D = 10$) also have bounds on their size, such as $l_{\min} = 3$ and $l_{\max} = 8$.

Other problems of interest have $l_{\min} = 0$ and are thus length limited but have no practical lower bound on length [2, p. 396]. Yet other problems have not fixed bounds but a constraint on the difference between minimum and maximum codeword length, a quantity referred to as fringe [3, p. 121]. As previously noted, large fringe has a negative effect on the speed of a decoder.

If we require either no minimum or no maximum, it is easy to find values for $l_{\min}$ or $l_{\max}$ that do not limit the problem. As mentioned, setting $l_{\min} = 0$ results in a trivial minimum, as does $l_{\min} = 1$. Similarly, setting $l_{\max} = n$ or using the hard upper bound $l_{\max} = \lceil (n - 1)/(D - 1) \rceil$ results in a trivial maximum value.
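The trivial bounds just described can be computed directly. The short Python sketch below is illustrative only; `trivial_length_bounds` is a hypothetical helper name, not something defined in this paper.

```python
def trivial_length_bounds(n: int, D: int) -> tuple:
    """Return (l_min, l_max) that do not restrict an n-symbol, D-ary code.

    l_min = 0 never binds, and l_max = ceil((n - 1) / (D - 1)) is the hard
    upper bound on codeword length given in the text.
    """
    if n < 1 or D < 2:
        raise ValueError("need n >= 1 and D >= 2")
    l_max = -(-(n - 1) // (D - 1))  # integer ceiling of (n - 1) / (D - 1)
    return 0, l_max


print(trivial_length_bounds(7, 2))  # (0, 6)
print(trivial_length_bounds(7, 3))  # (0, 3)
```

Integer ceiling division is used instead of floating-point division so that the bound stays exact for large $n$.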

If both minimum and maximum values are trivial, Huffman coding [4] yields a prefix code minimizing expected codeword length $\sum_i p_i l_i$. The conditions necessary and sufficient for the existence of a prefix code with length vector $l$ are the integer constraint, $l_i \in \mathbb{Z}_+$, and the Kraft (McMillan) inequality [5],

$$\sum_{i=1}^{n} D^{-l_i} \le 1. \tag{1}$$

Finding values for $l$ is sufficient to find a corresponding code.
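To make the last sentence concrete, here is a minimal Python sketch, assuming exact rational arithmetic via `fractions.Fraction`, that evaluates the Kraft sum in (1) and, when it is at most 1, assigns codewords with the standard canonical (counter-based) construction. The helper names are hypothetical, and the canonical construction is one common choice, not a method specified in this paper.

```python
from fractions import Fraction


def kraft_sum(lengths, D):
    """Exact Kraft sum sum_i D**(-l_i) of a D-ary length vector."""
    return sum((Fraction(1, D ** l) for l in lengths), Fraction(0))


def canonical_code(lengths, D):
    """Build D-ary prefix codewords (digit tuples) for lengths satisfying (1).

    Codewords are assigned in order of nondecreasing length, treating the
    next codeword as a base-D counter that is extended (multiplied by D)
    whenever the length grows and incremented after each assignment.
    """
    if any(l < 0 for l in lengths) or kraft_sum(lengths, D) > 1:
        raise ValueError("lengths violate the integer or Kraft constraint")
    codes = [None] * len(lengths)
    value, prev_len = 0, 0
    for i in sorted(range(len(lengths)), key=lambda j: lengths[j]):
        l = lengths[i]
        value *= D ** (l - prev_len)          # extend counter to length l
        digits, v = [], value
        for _ in range(l):                    # write value as l base-D digits
            digits.append(v % D)
            v //= D
        codes[i] = tuple(reversed(digits))
        value, prev_len = value + 1, l
    return codes


print(kraft_sum([1, 2, 2, 2, 2, 2, 2], 3))  # 1
print(canonical_code([1, 2, 2, 2, 2, 2, 2], 3))
```

For the ternary length vector used later in Fig. 1, the Kraft sum is exactly 1 and the codewords are 0, 10, 11, 12, 20, 21, 22.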

It is not always obvious that we should minimize the expected number of questions $\sum_i p_i l_i$ (or, equivalently, the expected number of questions in excess of the first $l_{\min}$, $\sum_i p_i (l_i - l_{\min})^+$, where $x^+$ is $x$ if $x$ is positive, $0$ otherwise). We generalize and investigate how to minimize the value

$$\sum_i p_i \varphi(l_i - l_{\min}) \tag{2}$$

under the above constraints for any penalty function $\varphi(\cdot)$ convex and increasing on $\mathbb{R}_+$. Such an additive measurement of cost is called a quasiarithmetic penalty, in this case a convex quasiarithmetic penalty.

One such function $\varphi$ is $\varphi(\delta) = (\delta + l_{\min})^2$, a quadratic value useful in optimizing a communications delay problem [6]. Another function, $\varphi(\delta) = D^{t\delta}$ for $t > 0$, can be used to minimize the probability of buffer overflow in a queueing system [7].

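Mathematically stating the bounded-length problem,

$$\begin{aligned}
&\text{Given} && p = (p_1, \ldots, p_n),\ p_i > 0; \\
& && D \in \{2, 3, \ldots\}; \\
& && \text{convex, monotonically increasing } \varphi : \mathbb{R}_+ \to \mathbb{R}_+ \\
&\text{Minimize}_{\{l\}} && \sum_i p_i \varphi(l_i - l_{\min}) \\
&\text{subject to} && \sum_i D^{-l_i} \le 1; \\
& && l_i \in \{l_{\min}, l_{\min} + 1, \ldots, l_{\max}\}.
\end{aligned}$$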
Note that we need not assume that probabilities $p_i$ sum to 1; they could instead be arbitrary positive weights.
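For checking small cases, the problem above can be solved by exhaustive search. The Python sketch below is only a reference (its running time is exponential in $n$) and is not the algorithm of this paper; the function name and the example weights are hypothetical. Because the weights are sorted in nonincreasing order and $\varphi$ is increasing, it suffices to try nondecreasing length vectors.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def brute_force_bounded_length(p, D, l_min, l_max, phi):
    """Exhaustively solve the bounded-length problem (tiny n only).

    p holds positive weights in nonincreasing order, so only nondecreasing
    length vectors need to be tried. Returns (cost, lengths) minimizing
    sum_i p_i * phi(l_i - l_min) subject to the Kraft inequality and
    l_min <= l_i <= l_max, or None if no feasible code exists.
    """
    best = None
    for lengths in combinations_with_replacement(range(l_min, l_max + 1), len(p)):
        if sum(Fraction(1, D ** l) for l in lengths) > 1:
            continue  # violates the Kraft inequality
        cost = sum(p_i * phi(l - l_min) for p_i, l in zip(p, lengths))
        if best is None or cost < best[0]:
            best = (cost, list(lengths))
    return best


def linear(delta):
    return delta


print(brute_force_bounded_length([4, 3, 2, 1], 2, 0, 10, linear))  # (19, [1, 2, 3, 3])
print(brute_force_bounded_length([4, 3, 2, 1], 2, 0, 2, linear))   # (20, [2, 2, 2, 2])
```

With $\varphi(\delta) = \delta$ and a loose $l_{\max}$ the result matches Huffman coding; tightening $l_{\max}$ to 2 forces the flat code.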

Given a finite $n$-symbol input alphabet with an associated probability vector $p$, a $D$-symbol output alphabet with codewords of lengths $[l_{\min}, l_{\max}]$ allowed, and a constant-time calculable penalty function $\varphi$, we describe an $O(n(l_{\max} - l_{\min}))$-time $O(n)$-space algorithm for constructing a $\varphi$-optimal code. In Section II we present a brief review of the relevant literature before extending to $D$-ary codes a notation first presented in [6]. This notation aids in solving the problem in question by reformulating it as an instance of the $D$-ary Coin Collector's problem, presented in the section as an extension of the original (binary) Coin Collector's problem [8]. An extension of the Package-Merge algorithm solves this problem; we introduce the reduction and resulting algorithm in Section III. An application to a previously proposed problem involving tree fringe is discussed in Section IV.

II. Preliminaries 

Reviewing how the problem in question differs from binary Huffman coding:

1) It can be nonbinary, a case considered by Huffman in his original paper [4];

2) There is a maximum codeword length, a restriction previously considered, e.g., [9] in $O(n^3 l_{\max} \log D)$ time [10] and $O(n^2 \log D)$ space, but solved efficiently only for binary coding, e.g., [8] in $O(n l_{\max})$ time and $O(n)$ space and most efficiently in [11];

3) There is a minimum codeword length, a novel restriction;

4) The penalty can be nonlinear, a modification previously considered, but only for binary coding, e.g., [12].

The minimum size constraint on codeword length requires only a relatively simple change to the solution range of [8]. The nonbinary coding generalization is a bit more involved; it requires first modifying the Package-Merge algorithm to allow for an arbitrary numerical base (binary, ternary, etc.), then modifying the coding problem to allow for a provable reduction to the modified Package-Merge algorithm. The $O(n(l_{\max} - l_{\min}))$-time $O(n)$-space algorithm minimizes height (that is, maximum codeword length) among optimal codes (if multiple optimal codes exist).

Before presenting an algorithm for optimizing the above 
problem, we introduce a notation for codes that generalizes 
one first presented in [6] and modified in [12]. 

The key idea: each node $(i, l)$ represents both the share of the penalty, $\mu(i, l)$ (weight), and the (scaled) share of the Kraft sum, $\rho(i, l)$ (width), assumed for the $l$th output symbol of the $i$th codeword. By showing that total weight is an increasing function of the penalty and that there is a one-to-one correspondence between an optimal code and a corresponding optimal nodeset, we reduce the problem to an efficiently solvable problem, the Coin Collector's problem.

In order to do this, we first need to make a modification to the problem analogous to one Huffman made in his original nonbinary solution. We must in some cases add one or more "dummy" inputs of probability $p_i = 0$ to the probability vector to ensure that the optimal code satisfies the Kraft inequality with equality, an assumption underlying both the Huffman algorithm and ours. If we use the minimum number of dummy inputs needed to make $n \equiv 1 \pmod{D - 1}$, we can assume without loss of generality that the optimal code satisfies $\sum_i D^{-l_i} = 1$. With this modification, we present nodeset notation:

Definition 1: A node is an ordered pair of integers $(i, l)$ such that $i \in \{1, \ldots, n\}$ and $l \in \{l_{\min} + 1, \ldots, l_{\max}\}$. Call the set of all possible nodes $I$. This set can be arranged in an $n \times (l_{\max} - l_{\min})$ grid, e.g., Fig. 1. The set of nodes, or nodeset, corresponding to input symbol $i$ (assigned codeword $c_i$ with length $l_i$) is the set of the first $l_i - l_{\min}$ nodes of column $i$, that is, $\eta_l(i) = \{(j, l') \mid j = i,\ l' \in \{l_{\min} + 1, \ldots, l_i\}\}$. The nodeset corresponding to length vector $l$ is $\eta(l) = \bigcup_i \eta_l(i)$; this corresponds to a set of $n$ codewords, a code. Thus, in Fig. 1 the dashed line surrounds a nodeset corresponding to $l = (1, 2, 2, 2, 2, 2, 2)$. We say a node $(i, l)$ has width $\rho(i, l) = D^{-l}$ and weight $\mu(i, l) = p_i \varphi(l - l_{\min}) - p_i \varphi(l - l_{\min} - 1)$, as shown in the example in Fig. 1. Note that if $\varphi(\delta) = \delta$, then $\mu(i, l) = p_i$.
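Definition 1 can be checked numerically. The Python sketch below (hypothetical helper names; the weights `p` are made up, since the text does not list the probabilities behind Fig. 1) builds $\eta(l)$ for the Fig. 1 length vector and confirms two facts used later: the total width is $(n D^{-l_{\min}} - 1)/(D - 1)$ when the Kraft sum equals 1, and the node weights of column $i$ telescope to $p_i(\varphi(l_i - l_{\min}) - \varphi(0))$.

```python
from fractions import Fraction


def node_width(l, D):
    """rho(i, l) = D**(-l); the width does not depend on i."""
    return Fraction(1, D ** l)


def node_weight(p_i, l, l_min, phi):
    """mu(i, l) = p_i*phi(l - l_min) - p_i*phi(l - l_min - 1)."""
    return p_i * phi(l - l_min) - p_i * phi(l - l_min - 1)


def nodeset(lengths, l_min):
    """eta(l): the first l_i - l_min nodes of each column i (1-based i)."""
    return {(i, l) for i, l_i in enumerate(lengths, start=1)
            for l in range(l_min + 1, l_i + 1)}


# Fig. 1 setting: n = 7, D = 3, l_min = 1, phi(delta) = delta**2.
# The weights p below are hypothetical; the text does not list them.
p, D, l_min = [3, 2, 2, 1, 1, 1, 1], 3, 1
phi = lambda delta: delta ** 2
lengths = [1, 2, 2, 2, 2, 2, 2]
N = nodeset(lengths, l_min)
total_width = sum(node_width(l, D) for _, l in N)
total_weight = sum(node_weight(p[i - 1], l, l_min, phi) for i, l in N)
print(total_width)  # 2/3
# Telescoping: total weight equals sum_i p_i*(phi(l_i - l_min) - phi(0)).
print(total_weight == sum(p_i * (phi(l_i - l_min) - phi(0))
                          for p_i, l_i in zip(p, lengths)))  # True
```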

Given a valid nodeset $N \subseteq I$, it is straightforward to find the corresponding length vector and, if it satisfies the Kraft inequality, a code.

We find an optimal nodeset using the $D$-ary Coin Collector's problem. Let $D^{\mathbb{Z}}$ denote the set of all integer powers of a fixed integer $D > 1$. The Coin Collector's problem of size $m$ considers "coins" indexed by $i \in \{1, 2, \ldots, m\}$. Each coin has a width, $\rho_i \in D^{\mathbb{Z}}$; one can think of width as coin face value, e.g., $\rho_i = 0.25 = 2^{-2}$ for a quarter dollar (25 cents). Each coin also has a weight, $\mu_i \in \mathbb{R}$. The final problem parameter is total width, denoted $\rho_{\mathrm{tot}}$. The problem is then:



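$$\begin{aligned}
&\text{Minimize}_{\{B \subseteq \{1, \ldots, m\}\}} && \sum_{i \in B} \mu_i \\
&\text{subject to} && \sum_{i \in B} \rho_i = \rho_{\mathrm{tot}} \\
&\text{where} && m \in \mathbb{Z}_+,\ \mu_i \in \mathbb{R},\ \rho_i \in D^{\mathbb{Z}},\ \rho_{\mathrm{tot}} \in \mathbb{R}_+.
\end{aligned} \tag{3}$$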



We thus wish to choose coins with total width $\rho_{\mathrm{tot}}$ such that their total weight is as small as possible. This problem has a linear-time solution given sorted inputs; this solution was found for $D = 2$ in [8] and is found for $D > 2$ here.
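A brute-force reference for problem (3) is handy for checking faster methods on tiny inputs. The Python sketch below is exponential in $m$ and is only an illustration; the function name and the coin values are hypothetical, and coins are 0-indexed here rather than 1-indexed as in the text.

```python
from fractions import Fraction
from itertools import combinations


def coin_collector_brute_force(weights, widths, rho_tot):
    """Reference solver for problem (3) by exhaustive search (tiny m only).

    Coins are 0-indexed here. Returns (total_weight, chosen_indices) for a
    minimum-weight subset whose widths sum exactly to rho_tot, or None.
    """
    best = None
    for k in range(len(weights) + 1):
        for subset in combinations(range(len(weights)), k):
            if sum((widths[i] for i in subset), Fraction(0)) != rho_tot:
                continue
            total = sum(weights[i] for i in subset)
            if best is None or total < best[0]:
                best = (total, set(subset))
    return best


# Binary (D = 2) coins: two half-dollars and three quarters, rho_tot = 1.
half, quarter = Fraction(1, 2), Fraction(1, 4)
print(coin_collector_brute_force([5, 1, 2, 4, 3],
                                 [half, quarter, quarter, quarter, half], 1))
# (6, {1, 2, 4})
```

Here the cheapest way to reach a total width of one dollar is one half-dollar (weight 3) plus the two lightest quarters (weights 1 and 2).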

Let $i \in \{1, \ldots, m\}$ denote both the index of a coin and the coin itself, and let $X$ represent the $m$ items along with their weights and widths. The optimal solution, a function of total width $\rho_{\mathrm{tot}}$ and items $X$, is denoted $\mathrm{CC}(X, \rho_{\mathrm{tot}})$ (the optimal coin collection for $X$ and $\rho_{\mathrm{tot}}$). Note that, due to ties, this need not be a unique solution, but the algorithm proposed here is deterministic; that is, it finds one specific solution, much like bottom-merge Huffman coding [13] or the corresponding length-limited problem [12], [14].

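Because we only consider cases in which a solution exists, $\rho_{\mathrm{tot}} = \omega \rho_{\mathrm{pow}}$ for some $\rho_{\mathrm{pow}} \in D^{\mathbb{Z}}$ and $\omega \in \mathbb{Z}_+$. Here, assuming $\rho_{\mathrm{tot}} > 0$, $\rho_{\mathrm{pow}}$ and $\omega$ are the unique pair of a power of $D$ and an integer that is not a multiple of $D$, respectively, which, multiplied, form $\rho_{\mathrm{tot}}$. If $\rho_{\mathrm{tot}} = 0$, $\omega$ and $\rho_{\mathrm{pow}}$ are not used. Note that $\rho_{\mathrm{pow}}$ need not be an integer.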
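The pair $(\omega, \rho_{\mathrm{pow}})$ can be computed exactly with rational arithmetic. The Python sketch below is a minimal illustration under the assumption that widths are represented as `fractions.Fraction` values; `decompose_total_width` is a hypothetical name. It first checks that the denominator of $\rho_{\mathrm{tot}}$ divides some power of $D$ (otherwise no collection of widths in $D^{\mathbb{Z}}$ can sum to it), then scales by $D$ until the quotient is an integer not divisible by $D$.

```python
from fractions import Fraction
from math import gcd


def decompose_total_width(rho_tot, D):
    """Return (omega, rho_pow) with rho_tot = omega * rho_pow, where rho_pow
    is an integer power of D and omega is a positive integer not divisible
    by D. Raises ValueError when rho_tot > 0 has no such form (then no
    collection of coins with widths in D^Z can have total width rho_tot).
    """
    q = Fraction(rho_tot)
    if q <= 0:
        raise ValueError("rho_tot must be positive")
    den = q.denominator
    while den != 1:                 # the denominator must divide a power of D
        g = gcd(den, D)
        if g == 1:
            raise ValueError("rho_tot is not a multiple of any power of D")
        den //= g
    k = 0
    while q.denominator != 1:       # scale up to an integer: q = rho_tot * D**(-k)
        q, k = q * D, k - 1
    while q.numerator % D == 0:     # strip remaining factors of D
        q, k = q / D, k + 1
    return int(q), Fraction(D) ** k


print(decompose_total_width(Fraction(2, 3), 3))  # omega = 2, rho_pow = 1/3
print(decompose_total_width(12, 2))              # omega = 3, rho_pow = 4
```

As the text notes, $\rho_{\mathrm{pow}}$ is not necessarily an integer: for $\rho_{\mathrm{tot}} = 2/3$ and $D = 3$ it is $1/3$.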

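Algorithm variables

At any point in the algorithm, given nontrivial $X$ and $\rho_{\mathrm{tot}}$, we use the following definitions:

Remainder: $\rho_{\mathrm{pow}}$ = the unique $x \in D^{\mathbb{Z}}$ such that $\rho_{\mathrm{tot}}/x \in \mathbb{Z} \setminus D\mathbb{Z}$

Minimum width: $\rho^* = \min_{i \in X} \rho_i \ (\in D^{\mathbb{Z}})$

Small width set: $X^* = \{i \mid \rho_i = \rho^*\} \ (\neq \emptyset)$

"First" item: $i^* = \arg\min_{i \in X^*} \mu_i$ (ties broken with highest index)

"First" package: $P^* = P$ such that $|P| = D$, $P \subseteq X^*$, and $P \prec X^* \setminus P$, if $|X^*| \ge D$; $P^* = \emptyset$ if $|X^*| < D$ (ties broken with highest indices)

where $D\mathbb{Z}$ denotes integer multiples of $D$ and $P \prec X^* \setminus P$ denotes that, for all $i \in P$ and $j \in X^* \setminus P$, $\mu_i \le \mu_j$. Then the following is a recursive description of the algorithm: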

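Recursive $D$-ary Package-Merge Procedure

Basis. $\rho_{\mathrm{tot}} = 0$: $\mathrm{CC}(X, \rho_{\mathrm{tot}}) = \emptyset$.

Case 1. $\rho^* = \rho_{\mathrm{pow}}$ and $X \neq \emptyset$: $\mathrm{CC}(X, \rho_{\mathrm{tot}}) = \mathrm{CC}(X \setminus \{i^*\}, \rho_{\mathrm{tot}} - \rho^*) \cup \{i^*\}$.

Case 2a. $\rho^* < \rho_{\mathrm{pow}}$, $X \neq \emptyset$, and $|X^*| < D$: $\mathrm{CC}(X, \rho_{\mathrm{tot}}) = \mathrm{CC}(X \setminus X^*, \rho_{\mathrm{tot}})$.

Case 2b. $\rho^* < \rho_{\mathrm{pow}}$, $X \neq \emptyset$, and $|X^*| \ge D$: Create $i'$, a new item with weight $\mu_{i'} = \sum_{i \in P^*} \mu_i$ and width $\rho_{i'} = D\rho^*$. This new item is thus a combined item, or package, formed by combining the $D$ least weighted items of width $\rho^*$. Let $S = \mathrm{CC}(X \setminus P^* \cup \{i'\}, \rho_{\mathrm{tot}})$ (the optimization of the packaged version, where the package is given a low index so that, if "repackaged," this occurs after all singular or previously packaged items of identical weight and width). If $i' \in S$, then $\mathrm{CC}(X, \rho_{\mathrm{tot}}) = S \setminus \{i'\} \cup P^*$; otherwise, $\mathrm{CC}(X, \rho_{\mathrm{tot}}) = S$.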
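The recursive procedure above can be unrolled into a loop in which every package remembers the original coins it contains, so that selecting a package in Case 1 selects all of them (this is how the replacement $S \setminus \{i'\} \cup P^*$ unwinds). The Python sketch below is a direct, unoptimized transcription under our own assumptions: it re-sorts $X^*$ every round, so it is not the linear-time implementation referred to in the text, and all names are hypothetical. Tie indices follow the stated rule: original coins get indices $1, \ldots, m$ and each new package gets a lower index than every earlier item.

```python
from fractions import Fraction
from math import gcd


def _rho_pow(rho, D):
    """The unique x in D^Z with rho / x an integer not divisible by D (rho > 0)."""
    q, k, den = Fraction(rho), 0, Fraction(rho).denominator
    while den != 1:
        g = gcd(den, D)
        if g == 1:
            raise ValueError("total width is not a multiple of a power of D")
        den //= g
    while q.denominator != 1:
        q, k = q * D, k - 1
    while q.numerator % D == 0:
        q, k = q / D, k + 1
    return Fraction(D) ** k


def package_merge(weights, widths, rho_tot, D):
    """D-ary Package-Merge: the recursive procedure above, unrolled into a loop.

    Coin i (0-based) has weight weights[i] and width widths[i] in D^Z.
    Returns (total_weight, chosen_coin_indices); raises ValueError if the
    instance is infeasible. Each step sorts, so this is not linear time.
    """
    # Item = (weight, width, tie_index, original coins it contains).
    # Coins get tie indices 1..m; packages get 0, -1, -2, ... so that among
    # equal weights the highest index wins and new packages come last.
    items = [(w, Fraction(r), i + 1, (i,))
             for i, (w, r) in enumerate(zip(weights, widths))]
    remaining, chosen, next_pkg = Fraction(rho_tot), [], 0
    while remaining != 0:                       # Basis: rho_tot = 0
        if remaining < 0 or not items:
            raise ValueError("infeasible instance")
        rho_pow = _rho_pow(remaining, D)
        rho_star = min(it[1] for it in items)
        if rho_star > rho_pow:
            raise ValueError("infeasible instance")
        # X*: minimum-width items, lightest first, ties to the highest index.
        small = sorted((it for it in items if it[1] == rho_star),
                       key=lambda it: (it[0], -it[2]))
        others = [it for it in items if it[1] != rho_star]
        if rho_star == rho_pow:                 # Case 1: keep the "first" item
            chosen.extend(small[0][3])
            remaining -= rho_star
            items = others + small[1:]
        elif len(small) < D:                    # Case 2a: drop all of X*
            items = others
        else:                                   # Case 2b: package P*
            pkg = small[:D]
            items = others + small[D:] + [(
                sum(it[0] for it in pkg), D * rho_star, next_pkg,
                tuple(c for it in pkg for c in it[3]))]
            next_pkg -= 1
    return sum(weights[i] for i in chosen), set(chosen)


half, quarter = Fraction(1, 2), Fraction(1, 4)
print(package_merge([5, 1, 2, 4, 3], [half, quarter, quarter, quarter, half], 1, 2))
# (6, {1, 2, 4})
```

On this binary instance the two lightest quarters are packaged into a half-dollar of weight 3, which is then paired with the other half-dollar of weight 3.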

Theorem 1: If an optimal solution to the Coin Collector's problem exists, the above recursive (Package-Merge) algorithm will terminate with an optimal solution.

Proof: We use induction on the number of input items. The basis is trivially correct, and each inductive case reduces the number of items by at least one. The inductive hypothesis on $\rho_{\mathrm{tot}} > 0$ and $X \neq \emptyset$ is that the algorithm is correct for any problem instance with fewer input items than instance $(X, \rho_{\mathrm{tot}})$.

If $\rho^* > \rho_{\mathrm{pow}} > 0$, or if $X = \emptyset$ and $\rho_{\mathrm{tot}} \neq 0$, then there is no solution to the problem, contrary to our assumption. Thus all feasible cases are covered by those given in the procedure. Case 1 indicates that the solution must contain at least one element (item or package) of width $\rho^*$, since $\rho_{\mathrm{tot}}/\rho^*$ is not a multiple of $D$ while every other width is a multiple of $D\rho^*$. These elements must include the minimum weight item in $X^*$, since otherwise we could substitute one of the items with this "first" item and achieve improvement. Case 2 indicates that the solution must contain a number of elements of width $\rho^*$ that is a multiple of $D$, since $\rho_{\mathrm{tot}}$ and all other widths are then multiples of $D\rho^*$. If this number is 0, none of the items in $P^*$ are in the solution. If it is not, then they all are. Thus, if $P^* = \emptyset$, the number is 0, and we have Case 2a. If not, we may "package" the items, considering the replaced package as one item, as in Case 2b. Thus the inductive hypothesis holds. ■

The algorithm can be performed in linear time and space, 
as with the binary version [8]. 

III. A General Algorithm 

Theorem 2: The solution $N$ of the Package-Merge algorithm for $X = I$ and

$$\rho_{\mathrm{tot}} = \frac{n - D^{l_{\min}}}{D - 1} D^{-l_{\min}}$$

has a corresponding length vector $l^N$ such that $N = \eta(l^N)$ and $\mu(N) = \min_l \sum_i p_i \varphi(l_i - l_{\min}) - \varphi(0) \sum_i p_i$.

A formal proof can be found in the full version at [15]. The idea is to show that, if there is an $(i, l) \in N$ with $l \in [l_{\min} + 2, l_{\max}]$ such that $(i, l - 1) \in I \setminus N$, one can strictly decrease the penalty by substituting item $(i, l - 1)$ for a set of items including $(i, l)$, showing the suboptimality of $N$. Conversely, if there is no such $(i, l)$, optimal $N$ corresponds to an optimal length vector.
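Putting Definition 1 and Theorem 2 together gives a complete, if slow, bounded-length coder. The Python sketch below is a hedged illustration, not the paper's optimized implementation: it pads with zero-probability dummy inputs until $n \equiv 1 \pmod{D - 1}$, runs a direct (re-sorting) transcription of Package-Merge on all nodes of $I$ with the $\rho_{\mathrm{tot}}$ of Theorem 2, and reads $l_i = l_{\min} + |N \cap \text{column } i|$. All function names are hypothetical, and the sketch simply rejects inputs with $n < D^{l_{\min}}$, for which this $\rho_{\mathrm{tot}}$ is negative.

```python
from fractions import Fraction
from math import gcd


def _rho_pow(rho, D):
    """The unique x in D^Z with rho / x an integer not divisible by D (rho > 0)."""
    q, k, den = Fraction(rho), 0, Fraction(rho).denominator
    while den != 1:
        g = gcd(den, D)
        if g == 1:
            raise ValueError("total width is not a multiple of a power of D")
        den //= g
    while q.denominator != 1:
        q, k = q * D, k - 1
    while q.numerator % D == 0:
        q, k = q / D, k + 1
    return Fraction(D) ** k


def _package_merge(items, rho_tot, D):
    """Package-Merge over (weight, width, tie_index, payloads) items; returns
    the payloads of the chosen collection of total width rho_tot."""
    remaining, chosen, next_pkg = Fraction(rho_tot), [], 0
    while remaining != 0:
        if remaining < 0 or not items:
            raise ValueError("infeasible instance")
        rho_pow = _rho_pow(remaining, D)
        rho_star = min(it[1] for it in items)
        if rho_star > rho_pow:
            raise ValueError("infeasible instance")
        small = sorted((it for it in items if it[1] == rho_star),
                       key=lambda it: (it[0], -it[2]))
        others = [it for it in items if it[1] != rho_star]
        if rho_star == rho_pow:                       # Case 1
            chosen.extend(small[0][3])
            remaining -= rho_star
            items = others + small[1:]
        elif len(small) < D:                          # Case 2a
            items = others
        else:                                         # Case 2b
            pkg = small[:D]
            items = others + small[D:] + [(
                sum(it[0] for it in pkg), D * rho_star, next_pkg,
                tuple(x for it in pkg for x in it[3]))]
            next_pkg -= 1
    return chosen


def bounded_length_code(p, D, l_min, l_max, phi):
    """Codeword lengths in [l_min, l_max] minimizing sum_i p_i*phi(l_i - l_min),
    via the nodeset reduction of Theorem 2. p: positive, nonincreasing."""
    n0 = len(p)
    probs = list(p) + [0] * ((1 - n0) % (D - 1))      # dummies: n = 1 mod (D-1)
    n = len(probs)
    rho_tot = Fraction(n - D ** l_min, (D - 1) * D ** l_min)
    if rho_tot < 0:
        raise ValueError("this sketch needs n >= D**l_min")
    # Node (i, l): weight mu(i, l), width D^-l, tie index i (1-based).
    items = [(probs[i - 1] * (phi(l - l_min) - phi(l - l_min - 1)),
              Fraction(1, D ** l), i, ((i, l),))
             for i in range(1, n + 1) for l in range(l_min + 1, l_max + 1)]
    lengths = [l_min] * n
    for i, _ in _package_merge(items, rho_tot, D):    # l_i = l_min + |N in column i|
        lengths[i - 1] += 1
    return lengths[:n0]                               # drop dummy inputs


def linear(delta):
    return delta


print(bounded_length_code([4, 3, 2, 1], 2, 0, 3, linear))  # [1, 2, 3, 3]
print(bounded_length_code([4, 3, 2, 1], 2, 0, 2, linear))  # [2, 2, 2, 2]
print(bounded_length_code([4, 3, 2, 1], 3, 0, 5, linear))  # [1, 1, 2, 2]
```

In the ternary example one dummy input is added (4 is not congruent to 1 mod 2), and it absorbs the third length-2 codeword before being dropped from the output.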

Because the Coin Collector's problem is linear in time and space (same-width inputs are presorted by weight, and numerical operations and comparisons are constant time), the overall algorithm finds an optimal code in $O(|I|) = O(n(l_{\max} - l_{\min}))$ time and space. Space complexity, however, can be lessened. This is because the algorithm output is a monotonic nodeset:

Definition 2: A monotonic nodeset, $N$, is one with the following properties:

$$(i, l) \in N \Rightarrow (i + 1, l) \in N \quad \text{for } i < n \tag{4}$$

$$(i, l) \in N \Rightarrow (i, l - 1) \in N \quad \text{for } l > l_{\min} + 1 \tag{5}$$

Fig. 1. The set of nodes $I$ with widths $\{\rho(i, l)\}$ and weights $\{\mu(i, l)\}$ for $\varphi(\delta) = \delta^2$, $n = 7$, $D = 3$, $l_{\min} = 1$, … (horizontal axis: $i$, input symbol).

In other words, a nodeset is monotonic if and only if it corresponds to a length vector $l$ with lengths sorted in nondecreasing order; this definition is equivalent to that given in [8].
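Properties (4) and (5) are easy to test mechanically. The Python sketch below (hypothetical helper names) checks both properties for a set of $(i, l)$ pairs and recovers the length vector $l_i = l_{\min} + |N \cap \text{column } i|$, using the Fig. 1 nodeset as an example.

```python
def is_monotonic(nodes, n, l_min):
    """Check properties (4) and (5) of Definition 2 for a set of (i, l) nodes."""
    for i, l in nodes:
        if i < n and (i + 1, l) not in nodes:          # property (4)
            return False
        if l > l_min + 1 and (i, l - 1) not in nodes:  # property (5)
            return False
    return True


def lengths_from_nodeset(nodes, n, l_min):
    """l_i = l_min + (number of nodes of N in column i)."""
    lengths = [l_min] * n
    for i, _ in nodes:
        lengths[i - 1] += 1
    return lengths


# The nodeset of l = (1, 2, 2, 2, 2, 2, 2) from Fig. 1 (n = 7, l_min = 1).
N = {(i, 2) for i in range(2, 8)}
print(is_monotonic(N, 7, 1), lengths_from_nodeset(N, 7, 1))  # True [1, 2, 2, 2, 2, 2, 2]
print(is_monotonic({(1, 2)}, 7, 1))  # False: column 1 deeper than column 2
```

Property (5) makes each column a prefix of that column's nodes, so $N = \eta(l)$ for the recovered $l$; property (4) makes that $l$ nondecreasing.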

While not all optimal codes are monotonic, using the aforementioned tie-breaking techniques, the algorithm always results in a monotonic code, one that has minimum maximum length among all monotonic optimal codes. Examples of monotonic nodesets include the sets of nodes enclosed by dashed lines in Fig. 1 and Fig. 2. In the latter case, $n = 21$, $D = 3$, $l_{\min} = 2$, and $l_{\max} = 8$, so $\rho_{\mathrm{tot}} = 2/3$.

In [8], monotonicity allows trading off a constant factor of time for drastically reduced space complexity for length-limited binary codes. We extend this to the bounded-length problem. Note that the total width of items that are each less than or equal to width $\rho$ is less than $2n\rho$. Thus, when we are processing items and packages of width $\rho$, fewer than $2n$ packages are kept in memory. The key idea in reducing space complexity is to keep only four attributes of each package in memory instead of its full contents. In this manner, we use $O(n)$ space while retaining enough information to reconstruct the optimal nodeset in algorithmic postprocessing.

Package attributes allow us to divide the problem into two 
subproblems with total complexity that is at most half that of 
the original problem. Define 



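For each package $S$, we retain only the following attributes:

1) $\mu(S) = \sum_{(i,l) \in S} \mu(i, l)$

2) $\rho(S) = \sum_{(i,l) \in S} \rho(i, l)$

3) $\nu(S) = |S \cap I_{\mathrm{mid}}|$

4) $\psi(S) = \sum_{(i,l) \in S \cap I_{\mathrm{hi}}} \rho(i, l)$

where $I_{\mathrm{hi}} = \{(i, l) \mid l > l_{\mathrm{mid}}\}$ and $I_{\mathrm{mid}} = \{(i, l) \mid l = l_{\mathrm{mid}}\}$.

We also define $I_{\mathrm{lo}} = \{(i, l) \mid l < l_{\mathrm{mid}}\}$.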
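The four attributes are sums over a package's nodes, so the attributes of a new package are just the sums of its members' attributes; this is what lets the first run discard package contents. The Python sketch below illustrates this under stated assumptions: `l_mid` is passed in as a parameter, the node weights come from a caller-supplied function, and all names and example values are hypothetical.

```python
from fractions import Fraction


def package_attributes(S, D, l_mid, weight):
    """(mu(S), rho(S), nu(S), psi(S)) for a set S of nodes (i, l).

    weight(i, l) supplies mu(i, l); widths are rho(i, l) = D**(-l).
    """
    mu = sum(weight(i, l) for i, l in S)
    rho = sum((Fraction(1, D ** l) for _, l in S), Fraction(0))
    nu = sum(1 for _, l in S if l == l_mid)
    psi = sum((Fraction(1, D ** l) for _, l in S if l > l_mid), Fraction(0))
    return mu, rho, nu, psi


def merge_attributes(*attrs):
    """All four attributes are additive, so a package's are its members' sums."""
    return tuple(sum(values) for values in zip(*attrs))


# Nodeset of l = (1, 2, 3, 3) with p = (4, 3, 2, 1), D = 2, l_min = 0,
# phi(delta) = delta (so mu(i, l) = p_i); l_mid = 2 is an arbitrary choice.
p = [4, 3, 2, 1]
S = {(1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3)}
w = lambda i, l: p[i - 1]
print(package_attributes(S, 2, 2, w))  # mu = 19, rho = 3, nu = 3, psi = 1/4
```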

With only these parameters, the "first run" of the algorithm takes $O(n)$ space. The output of this run is the package attributes of the optimal nodeset $N$. Thus, at the end of this first run, we know the value for $n_\nu = \nu(N)$, and we can consider $N$ as the disjoint union of four sets, shown in Fig. 2:

1) $A$ = nodes in $N \cap I_{\mathrm{lo}}$ with indexes in $[1, n - n_\nu]$,

2) $B$ = nodes in $N \cap I_{\mathrm{lo}}$ with indexes in $[n - n_\nu + 1, n]$,

3) $\Gamma$ = nodes in $N \cap I_{\mathrm{mid}}$,

4) $\Delta$ = nodes in $N \cap I_{\mathrm{hi}}$.

Due to the monotonicity of $N$, it is clear that $B = [n - n_\nu + 1, n] \times [l_{\min} + 1, l_{\mathrm{mid}} - 1]$ and $\Gamma = [n - n_\nu + 1, n] \times \{l_{\mathrm{mid}}\}$.



Note then that $\rho(B) = n_\nu (D^{-l_{\min}} - D^{1 - l_{\mathrm{mid}}})/(D - 1)$ and $\rho(\Gamma) = n_\nu D^{-l_{\mathrm{mid}}}$. Thus we need merely to recompute which nodes are in $A$ and in $\Delta$.

Because $\Delta$ is a subset of $I_{\mathrm{hi}}$, $\rho(\Delta) = \psi(N)$ and $\rho(A) = \rho(N) - \rho(B) - \rho(\Gamma) - \rho(\Delta)$. Given their respective widths, $A$ is a minimal weight subset of $[1, n - n_\nu] \times [l_{\min} + 1, l_{\mathrm{mid}} - 1]$ and $\Delta$ is a minimal weight subset of $[n - n_\nu + 1, n] \times [l_{\mathrm{mid}} + 1, l_{\max}]$. These will be monotonic if the overall nodeset is monotonic. The nodes at each level of $A$ and $\Delta$ can thus be found by recursive calls to the algorithm. This approach uses only $O(n)$ space while preserving time complexity as in [8].
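The width bookkeeping above is plain arithmetic on the first run's outputs. The Python sketch below, assuming a monotonic $N$ and exact `Fraction` arithmetic (the helper name and example values are hypothetical), computes $\rho(B)$ and $\rho(\Gamma)$ from the closed forms, takes $\rho(\Delta) = \psi(N)$, and obtains $\rho(A)$ by subtraction; these are the target widths for the two recursive subproblems.

```python
from fractions import Fraction


def split_widths(rho_N, psi_N, n_nu, D, l_min, l_mid):
    """Widths of A, B, Gamma, Delta from the first run's attributes.

    Assumes N is monotonic, so B and Gamma are the full blocks given above.
    """
    rho_B = n_nu * (Fraction(1, D ** l_min) - Fraction(D, D ** l_mid)) / (D - 1)
    rho_Gamma = Fraction(n_nu, D ** l_mid)
    rho_Delta = Fraction(psi_N)
    rho_A = Fraction(rho_N) - rho_B - rho_Gamma - rho_Delta
    return rho_A, rho_B, rho_Gamma, rho_Delta


# Binary example: l = (1, 2, 3, 3), l_min = 0, l_mid = 2, so rho(N) = 3,
# psi(N) = 1/4 (nodes (3, 3) and (4, 3)), and n_nu = 3 (columns 2 to 4).
print(split_widths(3, Fraction(1, 4), 3, 2, 0, 2))
```

In this example $A = \{(1, 1)\}$ has width $1/2$, $B$ holds the three level-1 nodes of columns 2 to 4 (width $3/2$), $\Gamma$ their level-2 nodes (width $3/4$), and $\Delta$ the two level-3 nodes (width $1/4$).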

There are changes we can make to the algorithm that, for certain inputs, will result in even better performance. For example, if $l_{\max} \approx \log_D n$, then, rather than minimizing the weight of nodes of a certain total width, it is easier to maximize weight and find the complementary set of nodes. Similarly, if most input symbols have one of a handful of probability values, one can consider this and simplify calculations. These and other similar optimizations have been done in the past for the special case $\varphi(\delta) = \delta$, $l_{\min} = 0$, $D = 2$ [16]-[20], though we will not address or extend such improvements here.

Note also that there are cases in which we can find a better upper bound for codeword length than $l_{\max}$ or a better lower bound than $l_{\min}$. In such cases, complexity is accordingly reduced, and, when $l_{\max}$ is effectively trivial (e.g., $l_{\max} = n - 1$), the Package-Merge approach can be replaced by conventional (linear-time) Huffman coding approaches. Likewise, when $\varphi(\delta) = \delta$ and $l_{\max} - l_{\min}$ is not $O(\log n)$, an approach similar to that of [21] as applied in [11] has better asymptotic performance. These alternative approaches are omitted due to space and can be found at [15].

Fig. 2. The set of nodes $I$, an optimal nodeset $N$, and disjoint subsets $A$, $B$, $\Gamma$, $\Delta$ (axes: $i$, input symbol; $l$, level).

IV. Fringe-limited Prefix Coding 

An important problem that can be solved with the techniques in this paper is that of finding an optimal code given an upper bound on fringe, the difference between minimum and maximum codeword length. Such a problem is proposed in [3, p. 121], where it is suggested that if there are $b - 1$ codes better than the best code with fringe at most $d$, one can find this $b$-best code with the $O(bn^3)$-time algorithm in [22, pp. 890-891], thus solving the fringe-limited problem. However, this presumes we know an upper bound for $b$ before running this algorithm. More importantly, if a probability vector is far from uniform, $b$ can be very large, since the number of viable code trees is $O(1.794\ldots^n)$ [23], [24]. Thus this is a poor approach in general. Instead, we can use the aforementioned algorithms for finding the optimal bounded-length code with codeword lengths restricted to $[l' - d, l']$ for each $l' \in \{\lceil \log_D n \rceil, \lceil \log_D n \rceil + 1, \ldots, \lfloor \log_D n \rfloor + d\}$, keeping the best of these codes; this covers all feasible cases of fringe upper bounded by $d$. (Here we again assume, without loss of generality, that $n \equiv 1 \pmod{D - 1}$.) Since there are at most $d + 1$ such runs, each taking $O(nd)$ time, the overall procedure has time complexity $O(nd^2)$ and space complexity $O(n)$.